An Overview of Heuristic Knowledge Discovery for Large Data Sets Using Genetic Algorithms and Rough Sets

Author

  • Alina Lazar
Abstract

Uninformed or blind search, which in the worst case processes and evaluates every node of a search space, is not realistic for extracting knowledge from large data sets because of time constraints that are closely related to the dimension of the data. Generally, the search space grows exponentially with problem size, which limits the size of problems that can realistically be solved using exact techniques such as exhaustive search. An alternative is offered by heuristic techniques, which can help in areas where classical search methods have failed. The word "heuristic" comes from Greek and means "to know", "to find", "to discover" or "to guide an investigation". Specifically, "Heuristics are techniques which seek good (near-optimal) solutions at a reasonable computational cost without being able to guarantee either feasibility or optimality, or even in many cases to state how close to optimality a particular feasible solution is." (Russell, Norvig, 1995) A heuristic is any technique that improves the average-case performance on a problem-solving task but does not necessarily improve the worst-case performance. Heuristic techniques search the problem space "intelligently", using knowledge of previously tried solutions to guide the search into fruitful areas of the search space. Often, search spaces are so large that only heuristic search can produce a solution in reasonable time. These techniques improve the efficiency of a search process, sometimes by sacrificing the completeness or the optimality of the solution. In search, heuristics are estimates of the distance remaining to the goal, computed from domain knowledge. The domain knowledge helps heuristics guide the search and can be represented in a variety of knowledge formats, including patterns, networks, trees, graphs, version spaces, rule sets, equations, and contingency tables.
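The contrast between blind and heuristic search described above can be illustrated with a small sketch. It is not taken from the paper: the grid, the function names (`search`, `zero`, `manhattan`) and the unit step cost are assumptions chosen for illustration. The same best-first routine is run twice, once with a heuristic that always returns 0 (which reduces it to uninformed uniform-cost search) and once with the Manhattan distance as the estimate of the distance remaining to the goal (A* search).

```python
import heapq


def search(grid, start, goal, heuristic):
    """Best-first search on a 4-connected grid (0 = free cell, 1 = wall).

    Returns (path_cost, nodes_expanded). With a heuristic that always
    returns 0 this behaves as blind uniform-cost search; with an
    admissible heuristic it behaves as A*.
    """
    rows, cols = len(grid), len(grid[0])
    # Queue entries are (f, g, cell) with f = g + h.
    frontier = [(heuristic(start, goal), 0, start)]
    best_g = {start: 0}
    expanded = 0
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if g > best_g.get(cell, float("inf")):
            continue  # stale entry: a cheaper route to this cell was found
        expanded += 1
        if cell == goal:
            return g, expanded
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1  # assumed unit cost per step
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    f = ng + heuristic((nr, nc), goal)
                    heapq.heappush(frontier, (f, ng, (nr, nc)))
    return None, expanded


def zero(a, b):
    """No domain knowledge: every cell looks equally promising."""
    return 0


def manhattan(a, b):
    """Domain knowledge: estimated number of grid steps left to the goal."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# Hypothetical 8x8 grid with no walls; start and goal on the top row.
GRID = [[0] * 8 for _ in range(8)]
blind_cost, blind_expanded = search(GRID, (0, 0), (0, 7), zero)
guided_cost, guided_expanded = search(GRID, (0, 0), (0, 7), manhattan)
print(blind_cost, blind_expanded)
print(guided_cost, guided_expanded)
```

Both runs find a path of the same cost, but the guided run expands fewer cells. On this toy grid the heuristic is admissible, so optimality is preserved; the heuristics discussed in the abstract generally give no such guarantee.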
With regard to heuristics, there are a number of generic approaches, such as greedy search, A* search, tabu search, simulated annealing, and population-based heuristics. Heuristic methods can be applied to a wide class of problems in optimization, classification, statistics, recognition, planning and design. Of special interest is the integration of heuristic search principles with the dynamic processes in which data becomes available in successive stages, or where


Similar resources

Heuristic Knowledge Discovery for Archaeological Data Using Genetic Algorithms and Rough Sets

The goal of this research is to investigate and develop heuristic tools in order to extract meaningful knowledge from large-scale archaeological data sets. Database queries help us to answer only simple questions. Intelligent search tools integrate heuristics with knowledge discovery tools, and they use data to build models of the real world. We would like to investigate these tools and combi...


A New Approach for Knowledge Based Systems Reduction using Rough Sets Theory (RESEARCH NOTE)

Knowledge analysis for decision support systems is one of the most difficult tasks in information systems. This paper presents a new approach to this problem based on notions from the mathematical theory of Rough Sets. Using these concepts, a systematic approach has been developed to reduce the size of the decision database and extract a reduced rule set from vague and uncertain data. The method ha...


Fuzzy-rough feature selection accelerator

The fuzzy rough set method provides an effective approach to data mining and knowledge discovery from hybrid data that include both categorical and numerical values. However, its time consumption is intolerable when analyzing data sets of large scale and high dimensionality. Many heuristic fuzzy-rough feature selection algorithms have been developed; however, quite often, these methods are still c...


Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach

Feature selection can be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not perform well under these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples become extremely hard. As a result, neural networks and clustering a...


Rough sets theory in site selection decision making for water reservoirs

Rough Sets theory, introduced by the mathematician Pawlak (1982, 1991), is a mathematical approach to analyzing vague descriptions of objects. This paper explores the use of Rough Sets theory in site location investigation for buried concrete water reservoirs. Making an appropriate decision on site location can avoid unnecessary costs, which is very important in constr...



Publication date: 2004